Static and Dynamic Big Data Partitioning on Apache Spark

Authors

Massimiliano Bertolucci

Emanuele Carlini

Patrizio Dazzi

Alessandro Lulli

Laura Ricci

Abstract

Many of today’s large datasets are organized as graphs. Due to their size, it is often infeasible to process these graphs on a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented different graph algorithms and compared their performance using a naive partitioning solution against more elaborate strategies, both static and dynamic.
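As a rough illustration of why the choice of partitioning can matter for graph algorithms, the sketch below counts how many edges cross partition boundaries under two vertex assignments. It is not the authors' method and does not use Spark: the toy path graph, the modulo assignment standing in for a "naive" strategy, and the contiguous-block assignment standing in for a locality-aware one are all assumptions made for this example. The abstract does not say which strategies the paper actually evaluates.

```python
from typing import Callable, Iterable, Tuple

Edge = Tuple[int, int]


def cut_edges(edges: Iterable[Edge], assign: Callable[[int], int]) -> int:
    """Count edges whose two endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if assign(u) != assign(v))


# Hypothetical toy input: a path graph 0-1-2-...-7 split into 2 partitions.
NUM_VERTICES = 8
NUM_PARTITIONS = 2
EDGES = [(v, v + 1) for v in range(NUM_VERTICES - 1)]


def modulo_partition(v: int) -> int:
    # Assumed "naive" strategy: vertex id modulo the number of partitions.
    return v % NUM_PARTITIONS


def block_partition(v: int) -> int:
    # Assumed locality-aware strategy: contiguous blocks of vertex ids.
    block_size = NUM_VERTICES // NUM_PARTITIONS
    return v // block_size


naive_cut = cut_edges(EDGES, modulo_partition)
block_cut = cut_edges(EDGES, block_partition)
print(naive_cut, block_cut)  # prints "7 1"
```

On this toy graph the modulo assignment separates every pair of neighbours, while the block assignment cuts a single edge. In a distributed setting each cut edge typically implies communication between machines, which is one plausible reason a partitioning strategy can affect running time.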

Similar resources

The comparison effects of eight weeks spark and frenkel exercises on static and dynamic balance in the blinds

Introduction: One of the most important human senses is vision, the loss of which causes many primary and secondary complications for physical and psychological health, such as difficulties in static and dynamic balance. This study aimed to compare the effect of 8 weeks of Spark and Frenkel exercise training on static and dynamic balance in blind people. ...

The STARK Framework for Spatio-Temporal Data Analytics on Spark

Big Data sets can contain all types of information: from server log files to tracking information of mobile users with their location at a point in time. Apache Spark has been widely accepted for Big Data analytics because of its very fast processing model. However, Spark has no native support for spatial or spatio-temporal data. Spatial filters or joins using, e.g., a contains predicate are no...

Efficient spatio-temporal event processing with STARK

For Big Data processing, Apache Spark has been widely accepted. However, when dealing with events or any other spatio-temporal data sets, Spark becomes very inefficient as it does not include any spatial or temporal data types and operators. In this paper we demonstrate our STARK project that adds the required data types and operators, such as spatio-temporal filter and join with various predic...

Dynamic Multi-Objective Optimization with jMetal and Spark: A Case Study

Technologies for Big Data and Data Science are receiving increasing research interest nowadays. This paper introduces the prototyping architecture of a tool aimed to solve Big Data Optimization problems. Our tool combines the jMetal framework for multi-objective optimization with Apache Spark, a technology that is gaining momentum. In particular, we make use of the streaming facilities of Spark...

An Adaptive Partitioning Scheme for Ad-hoc and Time-varying Database Analytics

Data partitioning significantly improves query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset, often focusing on finding the best partitioning for a particular query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload. F...

Publication date: 2015
